Forward Step Functions Tags with logs to the backend #618
Verified again after the scripts are updated.
LGTM to me overall, left a few questions.
aws/logs_monitoring/cache.py
Outdated
#######################

class StepFunctionsTagsCache(LambdaTagsCache):
Now that we have three caches, it might be a good idea to split these Cache classes out into their own files.
aws/logs_monitoring/cache.py
Outdated
get_resources_paginator = resource_tagging_client.get_paginator("get_resources")

try:
    for page in get_resources_paginator.paginate(
Just so I understand, we are following the lambda approach of prefetching the tags from each StepFunction, not the Cloudwatch Log group approach which doesn't fetch anything?
Correct.
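For readers following the thread, the prefetch approach confirmed above can be sketched roughly as follows. This is an illustration only, not the forwarder's actual code: the function name `fetch_step_functions_tags`, the `"states"` resource type filter, and the lowercasing of tags are assumptions (the lowercasing is chosen to match the `env:staging123` and `kimi_test:kimi-test` tags reported in the PR description). The client is expected to be something like `boto3.client("resourcegroupstaggingapi")`.

```python
from typing import Dict, List


def fetch_step_functions_tags(tagging_client) -> Dict[str, List[str]]:
    """Prefetch tags for all state machines, keyed by state machine ARN.

    `tagging_client` is assumed to behave like
    boto3.client("resourcegroupstaggingapi").
    """
    paginator = tagging_client.get_paginator("get_resources")
    tags_by_arn: Dict[str, List[str]] = {}

    # Assumption: "states" selects Step Functions resources.
    for page in paginator.paginate(ResourceTypeFilters=["states"]):
        for mapping in page.get("ResourceTagMappingList", []):
            arn = mapping["ResourceARN"]
            # Build "key:value" tag strings; lowercasing is an assumption.
            tags_by_arn[arn] = [
                f"{tag['Key']}:{tag['Value']}".lower()
                for tag in mapping.get("Tags", [])
            ]
    return tags_by_arn
```

Taking the client as a parameter keeps the sketch free of AWS credentials and makes it easy to exercise with a stub paginator.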
aws/logs_monitoring/cache.py
Outdated
Returns:
    state_machine_tags (List[str]): the list of "key:value" Datadog tag strings
"""
if self._is_expired():
Just curious: for a specific execution_arn, do its tags keep changing? What kind of change can they undergo? Or, once its tags are non-null, do they stay the same?
Tags are cached for 300 seconds. Every 300 seconds, the forwarder refetches all tags for state machines. Does this answer your question?
@kimi-p My concern is not about the TTL. I am just curious about what kind of change can happen to these cached tags. An expired item in the cache can still be usable if the item has not changed at all. For a state machine, its tags can change over time between executions, but for a specific execution of a state machine (execution_arn), I am wondering what kind of change can happen to its tags.
The tags are actually on the state machines. So unless the tags on these state machines change, the SF logs' tags won't change. The TTL of 300 seconds is there to make sure that the tags are reasonably fresh. I'm not sure if I answered your question; we can talk about it in the standup.
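To make the TTL discussion concrete, here is a minimal sketch of a cache that refetches all state machine tags once the 300-second window has passed. The class name `TTLTagsCache`, its constructor arguments, and the injectable `clock` are hypothetical and are not taken from the PR; only the 300-second TTL and the `_is_expired` check mirror what is discussed in this thread.

```python
import time
from typing import Callable, Dict, List, Optional


class TTLTagsCache:
    """Hypothetical cache: refetch every tag set once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, List[str]]],
        ttl_seconds: float = 300,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl_seconds = ttl_seconds
        self._clock = clock
        self._tags: Dict[str, List[str]] = {}
        self._fetched_at: Optional[float] = None

    def _is_expired(self) -> bool:
        # Nothing fetched yet counts as expired.
        if self._fetched_at is None:
            return True
        return self._clock() - self._fetched_at >= self._ttl_seconds

    def get(self, state_machine_arn: str) -> List[str]:
        # Tags live on the state machine, so one refresh covers all executions.
        if self._is_expired():
            self._tags = self._fetch()
            self._fetched_at = self._clock()
        return self._tags.get(state_machine_arn, [])
```

Because the tags belong to the state machine rather than to an individual execution, a lookup within the TTL window returns the same list for every execution of that state machine.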
LGTM
Latest changes are:
I have verified that after the refactor, logs are forwarded and their tags sent correctly.

What does this PR do?
- Fetch Step Functions tags via the resourcegroupstaggingapi API and attach these tags to logs.
- A DD_FETCH_STEP_FUNCTIONS_TAGS flag is added, defaulting to true.
- A DEPLOY_TO_SERVERLESS_SANDBOX env var flag is added for installation_test.sh to deploy to the Serverless sandbox account.
- datadog-cloudformation-template-serverless-sandbox and dd-lambda-signing-bucket-serverless-sandbox.
Motivation
The logs-to-traces project builds Step Functions traces from AWS logs. To get the env tag, we'd like to fetch Step Function tags on the forwarder and send these tags with logs to the logs intake. The logs-to-traces reducer will then pick up these tags and label traces with the correct env.
https://datadoghq.atlassian.net/browse/SLS-2718
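As an illustration of how a flag such as DD_FETCH_STEP_FUNCTIONS_TAGS with a default of true might be read, here is a small sketch. The helper name `get_env_flag` and the rule that only the string "true" (case-insensitive) enables the flag are assumptions for the example, not the forwarder's actual settings code.

```python
import os


def get_env_flag(name: str, default: bool) -> bool:
    """Read a boolean flag from the environment (hypothetical helper)."""
    raw = os.environ.get(name)
    if raw is None:
        # Unset variable: fall back to the default.
        return default
    # Assumption: only "true" (any case, surrounding spaces ignored) enables it.
    return raw.strip().lower() == "true"


# Defaults to True when the variable is not set.
DD_FETCH_STEP_FUNCTIONS_TAGS = get_env_flag("DD_FETCH_STEP_FUNCTIONS_TAGS", default=True)
```

With this shape, setting the variable to "false" would skip the tag fetch entirely, while leaving it unset keeps the new behaviour on.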
Testing Guidelines
Tested in Serverless sandbox account by running ./installation_test.sh with stack deletion line commented out.
env:staging123 and kimi_test:kimi-test (tags are labeled on the testing state machine)
Additional Notes
resourcegroupstaggingapi API doc
Testing Forwarder Logs
{
  "PaginationToken": "",
  "ResourceTagMappingList": [
    {
      "ResourceARN": "arn:aws:states:sa-east-1:425362996713:stateMachine:logs-to-traces-complicated-state-machine",
      "Tags": [
        { "Key": "KIMI_TEST", "Value": "kimi-test" },
        { "Key": "ENV", "Value": "staging123" }
      ]
    }
  ],
  "ResponseMetadata": {
    "RequestId": "333d6c49-0508-44d5-9a43-8940e28b0554",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "x-amzn-requestid": "333d6c49-0508-44d5-9a43-8940e28b0554",
      "content-type": "application/x-amz-json-1.1",
      "content-length": "243",
      "date": "Tue, 15 Nov 2022 16:24:54 GMT"
    },
    "RetryAttempts": 0
  }
}
Types of changes
Check all that apply